Methods and Attributes
Remember
Methods end with parentheses, while attributes don't.
df.shape: Attribute
df.info(): Method
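To make the distinction concrete, here is a minimal sketch on a small hand-built DataFrame. The contents of `df` are made up for illustration and are not taken from the IMDb dataset. It shows that an attribute gives back a stored value, while a method name written without parentheses refers to the method itself rather than its result.

```python
import pandas as pd

# hypothetical toy data, used only for illustration
df = pd.DataFrame({'title': ['A', 'B', 'C'], 'duration': [90, 120, 150]})

# attribute: accessed without parentheses, gives a value directly
shape = df.shape

# method: must be called with parentheses to produce a result
first_rows = df.head(2)

# without parentheses you get the method object itself, not a DataFrame
head_method = df.head

print(shape)                  # (3, 2)
print(callable(head_method))  # True
print(callable(shape))        # False
```

Forgetting the parentheses on a method does not raise an error; it just displays the method object, which is a common source of confusion.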
# import pandas
import pandas as pd
# read a dataset of top-rated IMDb movies into a DataFrame
movies = pd.read_csv('http://bit.ly/imdbratings')
# example method: show the first 5 rows
movies.head()
| | star_rating | title | content_rating | genre | duration | actors_list |
|---|---|---|---|---|---|---|
| 0 | 9.3 | The Shawshank Redemption | R | Crime | 142 | [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt... |
| 1 | 9.2 | The Godfather | R | Crime | 175 | [u'Marlon Brando', u'Al Pacino', u'James Caan'] |
| 2 | 9.1 | The Godfather: Part II | R | Crime | 200 | [u'Al Pacino', u'Robert De Niro', u'Robert Duv... |
| 3 | 9.0 | The Dark Knight | PG-13 | Action | 152 | [u'Christian Bale', u'Heath Ledger', u'Aaron E... |
| 4 | 8.9 | Pulp Fiction | R | Crime | 154 | [u'John Travolta', u'Uma Thurman', u'Samuel L.... |
# example method: calculate summary statistics
movies.describe()
| | star_rating | duration |
|---|---|---|
| count | 979.000000 | 979.000000 |
| mean | 7.889785 | 120.979571 |
| std | 0.336069 | 26.218010 |
| min | 7.400000 | 64.000000 |
| 25% | 7.600000 | 102.000000 |
| 50% | 7.800000 | 117.000000 |
| 75% | 8.100000 | 134.000000 |
| max | 9.300000 | 242.000000 |
# example attribute: number of rows and columns
movies.shape
(979, 6)
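Because `shape` is a plain tuple of (rows, columns), it can be unpacked or indexed like any other tuple. The sketch below assumes a small made-up DataFrame rather than the movies data.

```python
import pandas as pd

# hypothetical toy data, used only for illustration
df = pd.DataFrame({'star_rating': [9.3, 9.2, 9.1], 'title': ['X', 'Y', 'Z']})

# unpack the (rows, columns) tuple into two variables
n_rows, n_cols = df.shape

print(n_rows, n_cols)  # 3 2
print(len(df))         # 3, the number of rows
```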
# example attribute: data type of each column
movies.dtypes
star_rating float64
title object
content_rating object
genre object
duration int64
actors_list object
dtype: object
# use the optional 'include' parameter of the describe method to summarize only the 'object' columns
movies.describe(include='object')
| | title | content_rating | genre | actors_list |
|---|---|---|---|---|
| count | 979 | 976 | 979 | 979 |
| unique | 975 | 12 | 16 | 969 |
| top | True Grit | R | Drama | [u'Daniel Radcliffe', u'Emma Watson', u'Rupert... |
| freq | 2 | 460 | 278 | 6 |
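The `include` parameter also accepts `'all'`, which summarizes every column in one table. The sketch below uses a small made-up DataFrame (not the movies data) to compare the default behavior with `include='all'`.

```python
import pandas as pd

# hypothetical toy data, used only for illustration
df = pd.DataFrame({
    'title': ['A', 'B', 'B'],
    'genre': ['Drama', 'Crime', 'Drama'],
    'duration': [90, 120, 150],
})

# default: only numeric columns are summarized
numeric_summary = df.describe()

# include='all': text and numeric columns together;
# statistics that do not apply to a column are shown as NaN
full_summary = df.describe(include='all')

print(list(numeric_summary.columns))  # ['duration']
print(list(full_summary.columns))     # ['title', 'genre', 'duration']
print(full_summary.loc['top', 'genre'])
```

In the combined table, rows such as `unique`, `top`, and `freq` are filled only for text columns, and rows such as `mean` and `std` only for numeric ones.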